CoCoCo: Online Extraction of Russian Multiword Expressions

Authors

  • Mikhail Kopotev
  • Llorenç Escoter
  • Daria Kormacheva
  • Matthew Pierce
  • Lidia Pivovarova
  • Roman Yangarber
Abstract

In the CoCoCo project we develop methods to extract multi-word expressions of various kinds (idioms, multi-word lexemes, collocations, and colligations) and to evaluate their linguistic stability in a common, uniform fashion. In this paper we introduce a Web interface for querying Russian-language corpora, which gives the user access to these measures. Potential users of these tools include language learners, teachers, and linguists.

Similar Resources

MULTILINGUAL MULTIWORD EXPRESSIONS Literature Survey

Multiword expressions are idiosyncratic word usages of a language which often have non-compositional meaning. Knowledge of multiword expressions is necessary for many NLP tasks such as machine translation, natural language generation, named entity recognition, and sentiment analysis. In order for other NLP applications to benefit from the knowledge of multiword expressions, they need to be ide...

English-Russian-Finnish Cross-Language Comparison of Phrasal Verb Translation Equivalents

A phraseological expression in a language may have equivalent expressions in other languages with different morpho-syntactic structures and semantic properties. Our recent experience in the Benedict Project (EU IST-2001-34237), in which a Finnish semantic lexicon compatible with the Lancaster English semantic lexicon (Rayson et al., 2004) has been built, shows that there can exist complex cross-l...

A System for Compound Noun Multiword Expression Extraction for Hindi

Compound noun multiword expressions are important for many NLP applications like machine translation and information retrieval. This paper describes a system for Hindi compound noun multiword expressions (MWE) extraction from a given corpus. We identify major categories of compound noun MWEs, based on linguistic and psycholinguistic principles. Our extraction methods use various statistical co-...

What Is At Stake: A Case Study Of Russian Expressions Starting With A Preposition

The paper describes an experiment in detecting a specific type of multiword expression in Russian, namely expressions starting with a preposition. This covers not only prepositional phrases proper, but also fixed syntactic constructions like v techenie (‘in the course of’). First, we collect lists of such constructions in a corpus of 50 million words using a simple mechanism that combines statisti...

Automatic Extraction of Fixed Multiword Expressions

Fixed multiword expressions are strings of words which together behave like a single word. This research establishes a method for the automatic extraction of such expressions. Our method involves three stages. In the first, a statistical measure is used to extract candidate bigrams. In the second, we use this list to select occurrences of candidate expressions in a corpus, together with their s...

Publication date: 2015